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Abstract-This paper proposes a method to solve multivariable 
nonlinear optimization problem based on MCMC statistical 
sampling. Theoretical analysis proves that the maximum statistic 
converges to the maximum point of probability density. From 
this convergence property, we establish a newly algorithm to find 
the best solution of an optimization through largely MCMC 
sampling in condition that the exact generalized probability 
density function has been designed. The MCMC optimization 
algorithm needs less iterate variables reserved so that the 
computing speed is relatively high. Finally, we build a Markov 
chain model to demonstrate the dynamic movement of ladderlike 
labour transfer with multiparamter, and we practice this 
MCMC sampling optimization algorithm to find parameters 
estimations. The optimization results are reasonable and from 
which we obtain the short-term and long-term forecast of labour 
movements. 
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I. INTRODUCTION 

Statistical computing is an active frontier in modern 
statistics which focus on the application of calculation 
technology in statistics. In many complex stochastic problems, 
the theoretical probability is difficult to obtain directly 
because of the complexity of random factors. The solution is 
often replaced by the corresponding probability sample 
frequency analysis from computer simulation, which is known 
as the Monte Carlo method. The conditional distribution 
method is based on Markov chain samples generated 
indirectly when confront sampling from the complex 
distribution. It generates random samples whose are not 
independent, the new sample is correlated with previous one. 
It can be shown that on the ergodic condition, the limiting 
distribution of this Markov chain is exactly the distribution we 
intend to sample. This method is known as Markov Chain 
Monte Carlo (MCMC) algorithm. 

Metropolis-Hasting[l] algorithm is the first application of 
a MCMC sampling algorithm which is mainly used for 
simulation and sampling of single dimensional random 
variable. Gibbs sampling[2] is also a common and 
representative MCMC algorithm which is mainly used for 
multi-dimensional random variables sampling. 

MCMC algorithms have been widely used in statistical 
inference for many areas. In 1998, Brooks[3] discussed the 
continuous and discrete variables MCMC sampling 
techniques systematically, and he found answer for the 
iteration problem of stationary distribution when the MCMC 
algorithm is expanded for the Baysian parameter estimation 
and figure computing applications. In 2003, Andrieu[4] made 
a further study on machine learning and parameter inference 



with this random sampling method. MCMC algorithm also 
has important applications in the measurement of 
management such as time series model selection[5], Bayesian 
statistical inference[6], measuring the model parameter 
estimation[71 etc. Currently when we apply MCMC algorithm 
to estimate parameter, the common method is frequency 
consistency by law of large numbers and estimate the 
distribution expectation by the sample average, but for other 
sample statistics such as maximum, median, etc. , there are 
few application. 

Multivariable nonlinear optimization problem is hard to 
find the best solution. The traditional methods include the 
genetic algorithm[8], the ant colony algorithm[9] , simulated 
annealing[10] and so on. Only a few works involved with 
MCMC algorithm in optimization problem. In 1995, Eyerfll] 
applied MCMC sampling to simulated annealing search 
design. He simulated annealing random variation of process 
and control chain convergence to optimal solutions. But 
Eyer'sfll] method still focuses on the purpose of 
convergence, it accept poor disturbance in certain probability 
and take the final convergence result as the best solution. 
Because the selection of parameter annealing has significantly 
impact on the solution, and the rule of "probability accept" 
may results in the current state less than some solutions 
among the search trajectory, It is difficult to obtain global 
optimization solution. Even in some circumstance, the final 
algorithm approximate optimal solution is worse than the 
solution in search trajectory. 

The rest of this paper will be organized as follows: In 
section II, we will introduce the tradition methods of MCMC 
sampling, that is, Metropolis-Hasting Sampling Algorithm 
and Gibbs Sampling Algorithm. In section III, we will 
propose a computation statistics algorithm for multivariable 
and nonlinear optimization problem by the large sample 
properties of MCMC statistical method. This newly thinking 
for MCMC application includes the design of generalized 
density function, using of Metropolis-Hasting and Gibbs 
sampling methods, calculation the maximum statistic. At last, 
we will show the application of our MCMC Sampling 
algorithm to solve the real problem in section IV. 

II. MCMC SAMPLING ALGORITHM 

A. Metropolis-Hasting Sampling Algorithm 

The basic idea of MCMC methods is to construct a 
Markov chain with the specified stationary distribution, 
namely n (x) , then run the chain with full length till the 
sample chain value close enough to its stationary distribution. 
Then take stationary chains as the samples of n ( x ) and 
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make variety of statistical inference based on these samples. 
The first MCMC sampling method is Metropolis-Hastingfl] 
algorithm, which means sampling starts from another easily 
known reversible Markov chain Q, and obtain the new 
Markov chain by comparing. Suppose q(i, j) indicated that the 
transfer from the state i to j , and note x the initial point, 
then this algorithm is as follows: 

1) Drawn y from the known distribution q(x,y); 

2) Drawn u from uniform distribution U(0, 1) 

n(y)g{y,x t ) 
n{x t )q{x„y) 



3) Compute r(x t , y) = min < 1 



4) x M 



y u < r 



u > r 



Repeat 1) to 4), Hastionsfl] had proved that the transfer 
accordance with 1) to 4) is reversible Markov chain has 
stationary distribution n ( x ) . In particular of convenience, 
the transition probability q (i , j) can be a special conditional 
independence case, that is, the transition probability only 
relevant with j. Here r (x„ y) can be simplified as 

x(j)q(j) I 



min-M 



' 7v{x t )q{x t )\ 



B. Gibbs Sampling Algorithm 

Gibbs sampling algorithm was first proposed by Geman[2] 
in 1984, it is a special case of Metropolis-Hastings sampling. 
Gibbs sampling method is to separate the vector space into 
multiple parts and each part of the transformation to move 
forward by sampling. Gibbs sampling algorithm is suitable for 
high-dimensional random variables especially. 

The Gibbs algorithm is as follows: Assumed 
X=(X 1 ,X 2 ,...Xi i ) to be k-dimensional random variables with 
joint distribution n, ti; (k, (X i /Xi,..,X i . 1 ,Xi+j..,Xi l ) is the 
conditional distribution of X-,. Gibbs sampling is to generate 
random samples Xi according to the distribution 7ij with 



condition (X!~ l ,X' 2 



.,X'- 1 ), where 



X[ ~ n t {X t | X[,...,Xl x ,X , i ;[...,X , k - { ) , i = 1,2,...,* 

Then we obtain (X''\x^'\... ,x' t ~ l ) , and that is the whole 

process of Gibbs sampling. Gibbs sampling method converts 
high-dimensional distribution sampling into one-dimensional 
conditional distribution sampling. With iteration, the new 
high-dimensional samples (X'' 1 ^''^ ,...,X'~ X ) are random 

sampling from distribution n , which avoid the difficult to 
sample high dimensional distribution directly. 

III. MCMC STATISTICAL COMPUTING OPTIMIZATION 
ALGORITHM 

In a probability distribution, the largest density area is 
mostly tending to be sampled. So the sampling density 
function should converge to the maximum point of maximum 
probability if the sample is sufficiently large. That establishes 
the link between the function maximum value and sampling 
extreme statistics. 

A. Maximum Statistics of One-Dimensional Density 

Note x a maximum point of a continuous 
density /(x) , x e R , that is/(x ) = max/(x) . With the idea 



of likelihood, (x l ,---,x n ) , the n-times the sample from f(x) , 
take x* , which is f(x') = max(/(x,),---,/(x n )) ,as 
estimation of x . Make F x (x) the distribution function of 
f(x), X ] ,---,X n is simple sample and note Y = f(X) , that 
Y/ =f(X i ),i = 1,2, •••,« are independent and identically 
distribution. We have 

F r (y)=P{f(X)<y}= j f(x)dx 0<y<f(x ) (1) 
f(*)<y 

By the continuous density f(x) , F Y {y) is increasing 
strictly on 0< y<f(x a ) , and F r (0) = , F r (f(x Q )) = l . 
Take 7* = f(X') = max (/(X, ),•••, f(Xj) , we have 



F r (y) = [F Y {y)]' , 0<y<f(x ) 



(2) 



So when n — > co andO<j</(x ) , then F, (y)—>0 ; 

when y = f{x a ) , then F y > (y) — > 1 . That is, Y* convergence to 

f(x ) in probability and estimator x* convergence x in 
probability equivalently. 




Fig. 1 Sampling from probability density function 

B. Maximum Statistics of Multi-Dimensional Density 

Set x a maximum point of a continuous 
density f(x) ,xeR". The n-times the sample (x, , • • • , x n ) from 
f(x) , take x*, which is f(x*) = max(/(x,),---,/(x B )) , as 
estimation of x . Note Y = f(X) , then 



F Y {y) = \..]f{x)dx, 0<y<f(x ) 



(3) 



f(')<y 



By the continuous density f(x) , F Y (y) is increasing 
strictly on 0<>></(x ) , and J F ! ,(0) = , F Y (f(x )) = \ . 
Take^,* =/(X*)=max(/(X 1 ),---,/(X„)),wehave 

F Y: (y)=[F ¥ (y)]' , 0<y<f(x ) (4) 

Therefore when n^>co and 0< y < f(x a ) , then 
F Y . (y) ->• ; when y = f(x ) , then F y , (y) -> 1 . That is, Y* 
convergence to f(x ) in probability when n — > oo and 
estimator x convergence x in probability equivalently. 
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Similar conclusions can be deduced to one-dimensional 
and multi-dimensional discrete distribution. 

C. Generalized Probability Density Function Design 

Suppose the variables space of an optimization problem is 
n-dimensional feasible region x e , and the objective 
function is maxg(x). 

(1) If g(x)>O,xe0, note the generalized probability 
density function as 



n{x) ■■ 



g(x) 



j g(x)dx 



xe© 



(5) 



n{x),x e © can be considered as a generalized probability 
density function and its maximum point corresponds to the 
maximum point of g(x) . Thus through the previous analysis, 

x* will continue to close and converge to the maximum point 
of n{x) in distribution with the larger sampling. That is 

MCMC statistical computing optimization algorithm to solve 
multivariate objective function optimization problem. 

(2) If g(x) > 0, xe® is not satisfied, the generalized 
probability density function can be transformed as 



„i'(v) 



n{x) ■■ 



j" e g{x) dx 



, x e 



(6) 



(3) If the objective function isming(x), the generalized 
probability density function can be transformed as 



-i'(-v) 



71 (x) ■■ 



j e- g,x) dx 



, x e 



(7) 



D. MCMC Optimization Algorithm Design 

Now the difficulty is that multivariate distribution 
ti(x), x e is relatively complicated and it is not easy to 
practice sampling. While multi-dimensional distribution of 
random sampling with MCMC method has better adaptability, 
the sampling problem can be easily removed. 

Suppose we have the probability density function (5), the 
other two have almost the same induction as follows. Design a 
Markov chain with stationary distribution is n(x), x e , then 
the maximum point in finite sampling from distribution nix') 
will be sufficiently close to the maximum point of g(x) in the 

feasible region, in which state x e of the Markov chain is n- 
dimensional vector. 

First let T(x) is the uniform distribution on x e , and 
sampling from T(x) is through Gibbs sampling. Because there 
have many restrictions on feasible region, the region x e is 
an irregular n-dimensional area and its samples can be 
generated through Gibbs conditional distribution. Then, since 
T{x,) = T(y) , by the MCMC algorithm we have 






'*(*,)} 



The optimization algorithm is: 

1) Drawn y by the Gibbs method from the distribution 

T(x); 

2) Drawn u from the uniform distribution £7(0,1) 

g(y) \ 



3) Compute r(x, , y) = min <j 1, 

4) Take x t+ 



y u<r 
x, u > r 



5) Take x l+l 



x, g(x, )>g(x, +1 ) 
*m g(x t ) < g(x t+l ) 



Repeat 1) to 5). If the iteration times are large enough, then 
x n will convergence to the maximum point of the objective 
function g(x) in distribution. 

We can see from the problem analysis above that the key 
points of MCMC statistical computing method are designing 
of general probability density function and uniform sampling 
from conditional constraint region. 

IV. Numerical Example 

In current western china, the surplus labour force transfer 
becomes the key point to achieve the balance between urban 
and rural. Several researchers have theoretically investigated 
the surplus labour force transfer from other perspective. 
Taylor and Williamson[12] investigated the labour force 
transfer from the perspective of population age structure in 
1997, and in 2005, Temple[13] from the perspective of 
agricultural economic development. Fields [14] considered 
the labour immigrant for labour remuneration compensation, 
and Carrasco et al. [15] in 2008 studied the labour market 
effect of immigrant for employment rates and wages. All of 
these studies are qualitative but not quantitative analysis 
especially for prediction. In the following we will propose a 
Markov chain to express the rural labour transfer 
stochastically. 

A. Model of Ladderlike Labour Transfer 

Ladderlike labour transfer mode is developed in the work 
of balance urban and rural in Chongqing, a young 
municipality in western china. It means in a county region, the 
rural labour in remote countryside free themselves from 
traditional farming operations and transfer to new rural 
residential point, small town, and city to work in secondary 
and tertiary industries gradually. 

By considering labour as the transfer variable, different 
terrain shift of labour employment region, such as countryside, 
residential point, small town, city as the variable state, we 
build a labour transfer model. In view of the labour in each 
state could transfer other places out side this county region, 
we propose an additional variable to present it. We denote the 
five states as 1,2,3,4 and 5 individually. From Fig. 3 we can 
see the transfer movements between each state. Labour may 
stay at the same state or transfer to lower ladder state. State 
1,2,3,4 may turn to 5 but state 5 can only turn back to 4 
because those guys work in huge cities unwilling to go back 
rural again. The ladderlike transfer process is not always in 
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the same direction, but there have some reciprocating mobility 
to be a complex dynamic process. 




Countryside 



Residential 
point 





Small town 



City 



Fig. 2 The different ladders of the labour transfer 




Fig. 3 The state transition diagram of labour transfer 

Thus a discrete homogeneous Markov chain is built, and 
the labour transfer is determined by the transition probability 
matrix p = (p..) . 



P = 



Pu 


Pu 






Pis 


Pll 


Pn 


Pi:, 




Pis 




Pn 


Pn 


Pl4 


Pis 






P41 


P» 

P54 


P45 

Pa 



(9) 



B. MCMC to Estimate the Transition Probability Matrix 

Let IT. = (x n ,x n ,---,x ls ), i = \,2,---,n be the n steps 
actual transfer observation samples, by observational error s , 
there have 



n 



I\P + s, i = \,2,--,n-\ 



(10) 



To estimated transition probability matrix P is a 
continuity optimization problem. Although estimation P can 
be regarded as a linear model, but P has a restriction that 
summation of each row is 1. Therefore, the optimization 
cannot be solved by traditional method. The parameter 
solving can be regarded as 1 6 variables optimization problem, 
namely P = (p„) should satisfy 



in£(n,. +1 -n iJ p)(n i+1 -n. J p) r 




i,jeS 
i e S 



(11) 



On the basis of the MCMC algorithm design, the first step 
is application Gibbs sampling multivariate 

(Pn>Pv3.> — >Pn>PzL,'">P2,>P s i'"Pn) with the condition 
P = (/>..) by uniform sampling. Design sampling method as: 
start from {T n ,T u ,---,T u ,T u ■■-,T 2s ,T sl ---TJ , which 



t/(0,l) 



And normalize it p r 






(Pu>Pi2>--->Pu>P2i,--->P2,>P,i---Pa) was a qualified 
distribution. By the Gibbs sampling method, sample 
T v ~U(0,l) from (T u ,T u ,--,T u ,T 21 --,T 2s ,f l --TJ and 

normalized successively. Then obtain a new eligible 
(Pn > P12 > ' • • > Pu > P21, ■■•>P2s> Psi ■ ■ ■ Pss ) sample. At last, sample 
largely by MCMC algorithm and record the maximum 
statistic. From the theoretical analysis we know it will 
converges to the optimum of the objective function. 

We use the original labour distribution sample 
Yl ! =(x n ,x i2 ,---,x ls ), i = \,2,---,n from 2001 to 2011, 

sample iterate from the generalized density function using 
MCMC statistical optimization algorithm to obtain optimized 

parameters estimation P . 



0.782 0.079 0.138 

0.051 0.457 0.295 0.196 

0.108 0.417 0.316 0.158 

0.244 0.610 0.146 

0.549 0.451 



(12) 



From the estimation P we can see that p n is 0.782, 
which indicate the labour in countryside remain unchanged at 
the probability 0.782, and p l5 equals to 0.138 means the 
probability of go out of the county to work is 0.338. From /> 23 
equals to 0.295 we can see labour in resident transfer to small 
town at probability 0.295, which means these labours have 
adapt to urban life and accelerated the transfer consciously. 
p 2l equals to 0.051 indicate there still have one percent rural 

labour return back to countryside because they are not suited 
the life in residential point. About 0.601 parts of labour in city 
remain unchanged, which includes the regular citizen. 
Probability of /> 35 is 0.158 represent those who work and 

settle down outside the county. We can see the estimated 
parameters in transition probability matrix have characteristic 
of ladderlike transfer, so the MCMC estimation is reasonable. 



C. Prediction of Labour Transfer 

Headings, or heads, are organizational devices that guide 
the reader through your paper. There are two types: 
component heads and text heads. 



From (10) we count n iteratively 3 times to obtain the 
short term prediction of labour distribution in the next 3 years. 

TABLE I 

THE SHORT TERM PREDICTION OF LABOUR PROBABILITY DISTRIBUTION IN THE 

NEXT 3 YEARS 



Year 


Countryside 


Residential 
point 


Small 
town 


City 


Outside 
county 


2012 


0.406 


0.153 


0.182 


0.237 


0.221 


2013 


0.325 


0.122 


0.179 


0.323 


0.249 


2014 


0.261 


0.101 


0.190 


0.391 


0.257 



We can see from Table I that the rural labour will transfer 
to small town and cities continuously in short period, the 
proportion of labour in countryside will gradually decrease 
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but the decrease rate will slow down. The labour proportion in 
outside the county keeps increasing and the proportion in 
residential point fluctuates slightly. 

We also obtain the stationary distribution of this Markov 
chain as the long term prediction of each state, which is 
0.0125, 0.0476, 0.2302, 0.4926 and 0.2174. We can see in 
current policy and economic circumstance, the labour in city 
are in the majority, there only leave about 0.0125 proportion 
labour stay in countryside in the far future. That shows if the 
local government keep the current economic and policy 
unchanged for a long time, city-countryside dualization can 
eliminated. 

V. CONCLUSIONS 

The algorithm based on MCMC sampling method of 
statistical optimization is demonstrated to solve multi-variable 
nonlinear optimization problem. The key points of the MCMC 
optimization algorithm are the generalized probability density 
function design and Gibbs sampling algorithm from feasible 
region. MCMC optimization algorithm does not exist local 
optimal solution problem, and theoretically ensures the global 
optimal solution can be obtained if sampling large enough. 
We practice the MCMC optimization method to deal with 
parameter estimations for labour transfer model. Those 
estimations seem reasonable and from which we achieve a 
short-term and long-term forecast for labour circumstance. 
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